Abstract

We propose a class of nonparametric point estimators for $\theta = P(X < Y)$ for the
case where (X, Y) are paired, possibly dependent, continuous random variables. We make
use of the pairing structure to link the estimation of $\theta$ with the estimation of the
survival function and density function of $Y - X$. We consider the use of the bootstrap to obtain
confidence intervals for $\theta$ based on the proposed estimators. The performance of these
estimators is illustrated using simulated and real data. The example with real data shows
that not accounting for pairing and dependence might lead to different conclusions about
the relationship between X and Y.

1 Introduction

The study of stress-strength models has received considerable attention for many years due
to its applicability in diverse areas. The main interest in this kind of model is the quantity
$\theta = P(X < Y)$, where X and Y are random variables. In medicine, for example, if X and
Y are the outcomes of a control and an experimental treatment respectively, the parameter $\theta$
can be interpreted as the effectiveness of treatment Y (Ventura et al., 2011). This quantity is
also related to the Receiver Operating Characteristic (ROC) curves, where $\theta$ is interpreted as
an index of accuracy (Zhou, 2008). In engineering and reliability studies $\theta$ is also a quantity of
interest because it may represent the probability that the strength of a component (Y) exceeds
the stress (X) coming from external factors (Kotz et al., 2003).

Stress-strength models were introduced by Birnbaum (1956), who proposed a nonparametric
estimator for $\theta$ based on the Mann-Whitney statistic for the case where X and Y are
independent. There is a large amount of literature related to the study of point and interval
estimation of $\theta$ using different approaches (see Kotz et al., 2003 for a good survey on this).
For instance, in the case where X and Y are independent, Sun et al. (1998) propose a Bayesian
approach using reference priors; Baklizi and Eidous (2006) propose an estimator based on kernel
estimators of the densities of X and Y (which can be straightforwardly generalised to the use of
other nonparametric density estimators); Zhou (2008) proposes the use of bootstrap and
asymptotic intervals; Jing et al. (2009) estimate $\theta$ using the empirical likelihood; Montoya
(2008) and Díaz-Francés and Montoya (2012) propose the use of the profile likelihood for
conducting inference about $\theta$; and Ventura et al. (2011) propose the use of Bayesian inference
with Jeffreys and matching priors as well as modified profile likelihoods for the cases where X
and Y are normal or exponential random variables.

*Universidad de Sonora, Departamento de Matemáticas. E-mail: montoya@mat.uson.mx
†University of Warwick, Department of Statistics, Coventry, CV4 7AL, UK. E-mail:
EJ.Rubio@warwick.ac.uk

It is important to mention that the parameter $\theta$ may not be available in a closed form in
many cases (see Azzalini and Chiogna, 2004 and Gupta and Brown, 2001 for an example of
this). This makes it difficult (if at all feasible) to find a reparameterisation involving $\theta$,
which complicates the use of the classical approach. In particular, the use of the profile
likelihood might be difficult if this reparameterisation is not available (Díaz-Francés and
Montoya, 2012). Alternative inferential approaches that overcome this difficulty are Bayesian
inference, nonparametric estimation, and the bootstrap, since with these approaches it is
possible to obtain bootstrap confidence intervals and credible intervals from the corresponding
samples of $\hat{\theta}$ and $\theta$, respectively (Baklizi and Eidous, 2006; Zhou, 2008; Rubio and
Steel, 2012).

New interest has been focused on the estimation of $\theta$ in the case where X and Y are
dependent random variables. For example, Barbiero (2011) assumes that (X, Y) are jointly
normally distributed; Rubio and Steel (2012) suppose that X and Y are marginally distributed as
skewed scale mixtures of normals and construct the corresponding joint distribution using a
Gaussian copula; Domma and Giordano (2012a) construct the joint distribution of (X, Y) using
a Farlie-Gumbel-Morgenstern copula with marginal distributions belonging to the Burr system;
Domma and Giordano (2012b) consider Dagum distributed marginals and construct their
joint distribution using a Frank copula; among others (Nadarajah, 2005; Gupta et al., 2012). In
these papers, the importance of taking the assumption of dependence between X and Y into
account is illustrated using simulated and real data sets.

We propose a class of nonparametric estimators of $\theta$ for the case where (X, Y) are paired,
possibly dependent, continuous random variables. This scenario is of interest given that paired
observations are produced in many experimental designs (see e.g. Sprott, 2000 and Cox and Reid,
2000 for examples of this). The estimators proposed here are based on nonparametric
estimators of the survival function and density function of $Y - X$. This approach avoids making
distributional assumptions over (X, Y) and allows interval estimation of $\theta$ via nonparametric
bootstrap. In addition, this method can be easily implemented in R using already existing
packages. In Section 2 we introduce these estimators and discuss some of their properties. In
Section 3 we present two examples, using simulated and real data, which illustrate the
importance of accounting for pairing and dependence of the observations when conducting inference
about $\theta$.



2 Nonparametric estimators of $\theta$

Let (X, Y) be a pair of continuous random variables. Let (x, y) be a sample from (X, Y) of
size n and suppose that these observations are collected in couples $(x_i, y_i)$, i = 1, ..., n. Define
the variable $Z = Y - X$ and the vector of differences $z = y - x$. By definition, we have that

$$\theta = P(Z > 0) = 1 - F_Z(0) = S_Z(0),$$

where $F_Z$ and $S_Z$ are the cumulative distribution function and the survival function of Z,
respectively. If $F_Z$ or $S_Z$ is replaced by a nonparametric estimator, then we find an immediate
connection between the nonparametric estimation of the cumulative distribution function (or
the survival function) of Z and the nonparametric estimation of $\theta$. Based on this, we propose
the following algorithm for estimating $\theta$.



Algorithm 1

1: Calculate the differences $z = y - x$.

2: Using the sample z construct a nonparametric estimator $\hat{F}_Z$ of the distribution function of
Z and define the estimator $\hat{\theta} = 1 - \hat{F}_Z(0)$.
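The two steps of Algorithm 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it takes the empirical distribution function as the estimator of the distribution function of Z, and the function names, the toy data and the number of bootstrap replicates are hypothetical choices made for this example.

```python
import random

def theta_ecdf(x, y):
    """Algorithm 1 with the empirical distribution function:
    theta_hat = 1 - F_hat_Z(0), i.e. the proportion of differences above 0."""
    if len(x) != len(y):
        raise ValueError("x and y must be paired (same length)")
    z = [yi - xi for xi, yi in zip(x, y)]          # Step 1: differences
    return sum(1 for zi in z if zi > 0) / len(z)   # Step 2: 1 - ECDF(0)

def bootstrap_theta(x, y, n_boot=2000, seed=0):
    """Nonparametric bootstrap: resample the differences z with replacement
    and recompute theta_hat on each resample."""
    rng = random.Random(seed)
    z = [yi - xi for xi, yi in zip(x, y)]
    n = len(z)
    reps = []
    for _ in range(n_boot):
        zb = [z[rng.randrange(n)] for _ in range(n)]
        reps.append(sum(1 for zi in zb if zi > 0) / n)
    return reps

# Toy paired observations (hypothetical values, not from the paper).
x = [1.2, 0.7, 2.5, 1.9, 0.3, 1.1]
y = [1.8, 0.9, 2.1, 2.6, 0.8, 1.0]
print(theta_ecdf(x, y))
print(len(bootstrap_theta(x, y, n_boot=500)))
```

Because only the differences are resampled, each pair stays intact, so any dependence between X and Y is carried into the bootstrap replicates without being modelled explicitly.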



It is possible to define an alternative estimator of $\theta$ in Step 2 of Algorithm 1 by
constructing a nonparametric estimator $\hat{f}_Z$ of the density of Z, based on the sample z, and defining
the estimator $\hat{\theta} = \int_0^{\infty} \hat{f}_Z(z)\,dz$. Several nonparametric estimators $\hat{F}_Z$ and
$\hat{f}_Z$ can be considered for this purpose. For instance, kernel density estimators (Parzen,
1962), the empirical distribution function, shape-restricted density estimators (Cule et al.,
2010) and recently proposed smoothed versions of these (Dümbgen and Rufibach, 2011;
Rufibach, 2012). Note that the asymptotic properties of the estimator $\hat{\theta}$ are inherited from
those of the estimator $\hat{F}_Z$ evaluated at 0. For example, if we use the empirical distribution
function for estimating $F_Z(0)$, then we have that $\hat{\theta} \xrightarrow{a.s.} \theta$ as $n \to \infty$. The use of
nonparametric bootstrap on the sample z together with Algorithm 1 allows us to obtain a variety
of bootstrap confidence intervals for $\theta$ (DiCiccio and Efron, 1996).
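For a Gaussian kernel the density-based estimator has a simple closed form: integrating each kernel over the positive half-line gives $\hat{\theta} = n^{-1} \sum_i \Phi(z_i / h)$, where $\Phi$ is the standard normal distribution function and h is the bandwidth. The Python sketch below illustrates this; the normal-reference bandwidth rule, the function names and the toy data are assumptions made for the example, not choices taken from the text.

```python
import math
import statistics

def std_normal_cdf(u):
    """Standard normal distribution function Phi(u)."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

def theta_kernel(x, y, h=None):
    """Gaussian-kernel estimator of theta = P(Z > 0), with Z = Y - X.
    Integrating the kernel density estimate over (0, inf) gives
    the average of Phi(z_i / h)."""
    z = [yi - xi for xi, yi in zip(x, y)]
    n = len(z)
    if h is None:
        # Normal-reference rule of thumb (an assumption of this sketch).
        h = 1.06 * statistics.stdev(z) * n ** (-1 / 5)
    return sum(std_normal_cdf(zi / h) for zi in z) / n

# Toy paired observations (hypothetical values).
x = [1.2, 0.7, 2.5, 1.9, 0.3, 1.1]
y = [1.8, 0.9, 2.1, 2.6, 0.8, 1.0]
print(theta_kernel(x, y))
```

As the bandwidth shrinks towards zero, each term tends to the indicator of a positive difference, so the kernel estimator approaches the one based on the empirical distribution function.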

Note that this class of estimators avoids making assumptions on the distribution of (X, Y) 
and the sort of dependence between the variables X and Y. The relationship between these 
variables, which can be either dependent or independent, is implicitly included by modelling 
the differences between the observations which only requires a pairing of the observations. 

3 Examples

In this section, we illustrate the implementation of the estimators proposed in Section 2. In the
first example we use a sample simulated from a bivariate sinh-arcsinh distribution (Jones and
Pewsey, 2009). As detailed in Jones and Pewsey (2009), this distribution contains parameters
that control skewness, kurtosis and correlation of the marginals. This example illustrates the
influence of the assumptions of pairing and dependence on the bootstrap distributions of
$\hat{\theta}$ in terms of their location and spread. In the second example we use a real data set and
show that not including the assumptions of pairing and dependence may lead to opposite
conclusions about the relationship between X and Y.

In both examples, we consider the following 6 types of estimators of $\theta$. Estimators based
on Algorithm 1 with $\hat{\theta} = 1 - \hat{F}_Z(0)$: (1) The estimator "Kernel", based on a Gaussian kernel
estimator of $F_Z$; and (2) The estimator "ECDF", based on the empirical distribution function
for estimating $F_Z$. Estimators based on Algorithm 1 with $\hat{\theta} = \int_0^{\infty} \hat{f}_Z(z)\,dz$: (3) The
estimator "MLE", where $\hat{f}_Z$ is the shape-restricted density estimator described in Cule et al.
(2010); and (4) The estimator "SMLE", where $\hat{f}_Z$ is the smooth shape-restricted density
estimator proposed in Dümbgen and Rufibach (2011). For comparison purposes, we also consider
two estimators based on the assumption of independence of X and Y: (5) The estimator
"Independent" proposed in Baklizi and Eidous (2006), based on a Gaussian kernel estimator of the
marginal densities of X and Y; and (6) The estimator "Paired", based on a Gaussian kernel
estimator of the marginal densities of X and Y (Baklizi and Eidous, 2006) but taking the pairing
of the observations into account in the bootstrap sampling.
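The difference between estimators (5) and (6) lies only in how the bootstrap samples are drawn. The Python sketch below illustrates the two resampling schemes. It is not the estimator of Baklizi and Eidous (2006) itself: it assumes a plug-in estimator built from Gaussian kernel estimates of the two marginal densities, for which $P(X < Y)$ under independence reduces to an average of $\Phi((y_j - x_i)/\sqrt{h_x^2 + h_y^2})$. The bandwidth rule, function names and simulated data are likewise hypothetical.

```python
import math
import random
import statistics

def phi(u):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

def bandwidth(v):
    # Normal-reference rule of thumb (an assumption of this sketch).
    return 1.06 * statistics.stdev(v) * len(v) ** (-0.2)

def theta_marginal(x, y):
    """Plug-in estimate of P(X < Y) from Gaussian kernel estimates of the
    marginal densities, treating X and Y as independent."""
    s = math.hypot(bandwidth(x), bandwidth(y))
    return sum(phi((yj - xi) / s) for xi in x for yj in y) / (len(x) * len(y))

def boot_independent(x, y, n_boot, rng):
    """Resample x and y separately, ignoring the pairing."""
    n, m = len(x), len(y)
    reps = []
    for _ in range(n_boot):
        xb = [x[rng.randrange(n)] for _ in range(n)]
        yb = [y[rng.randrange(m)] for _ in range(m)]
        reps.append(theta_marginal(xb, yb))
    return reps

def boot_paired(x, y, n_boot, rng):
    """Resample whole pairs (x_i, y_i), keeping the pairing intact."""
    n = len(x)
    reps = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        reps.append(theta_marginal([x[i] for i in idx], [y[i] for i in idx]))
    return reps

# Simulated positively correlated pairs (hypothetical example data).
rng = random.Random(1)
x = [rng.gauss(0.0, 1.0) for _ in range(40)]
y = [xi + 0.5 + 0.5 * rng.gauss(0.0, 1.0) for xi in x]
sd_indep = statistics.stdev(boot_independent(x, y, 200, rng))
sd_pair = statistics.stdev(boot_paired(x, y, 200, rng))
print(sd_indep, sd_pair)
```

Both schemes use the same point estimate; only the spread of the replicates changes. For positively correlated pairs the paired replicates are typically less dispersed, in line with the narrower "Paired" intervals reported in Tables 1 and 2.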

Nonparametric density estimation is conducted using the R packages 'LogConcDEAD'
(Cule et al., 2009) and 'logcondens' (Dümbgen and Rufibach, 2011). Bootstrap samples and
bootstrap confidence intervals (Normal, Basic, Percentile and BCa) were obtained using the R
packages 'boot' (Canty and Ripley, 2012) and 'simpleboot' (Peng, 2008). R source code for
these examples is available upon request.
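To make the interval types concrete, the following Python sketch computes Normal, Basic and Percentile intervals from a point estimate and a vector of bootstrap replicates, using their textbook definitions. It is a simplified illustration, not the code of the 'boot' package: the quantile interpolation rule, the function names and the toy replicates are assumptions, and the BCa interval is omitted because it also needs bias-correction and acceleration constants.

```python
import statistics
from statistics import NormalDist

def quantile(sorted_vals, p):
    """Quantile by linear interpolation between order statistics."""
    pos = p * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])

def bootstrap_intervals(theta_hat, reps, level=0.95):
    """Normal, basic and percentile bootstrap intervals."""
    alpha = 1.0 - level
    s = sorted(reps)
    q_lo, q_hi = quantile(s, alpha / 2), quantile(s, 1 - alpha / 2)
    zcrit = NormalDist().inv_cdf(1 - alpha / 2)
    se = statistics.stdev(reps)                 # bootstrap standard error
    bias = statistics.fmean(reps) - theta_hat   # bootstrap bias estimate
    centre = theta_hat - bias
    return {
        "normal": (centre - zcrit * se, centre + zcrit * se),
        "basic": (2 * theta_hat - q_hi, 2 * theta_hat - q_lo),
        "percentile": (q_lo, q_hi),
    }

# Toy point estimate and replicates (hypothetical values).
theta_hat = 0.78
reps = [0.70, 0.74, 0.78, 0.80, 0.82, 0.76, 0.79, 0.75, 0.81, 0.77]
ci = bootstrap_intervals(theta_hat, reps)
for name, (lo, hi) in ci.items():
    print(f"{name}: ({lo:.3f}, {hi:.3f})")
```

The basic interval is the percentile interval reflected about the point estimate, which is why the two differ whenever the bootstrap distribution is not symmetric around $\hat{\theta}$.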



3.1 Simulated data 

In this example we use a simulated sample of size n = 100 from a bivariate sinh-arcsinh
distribution (Jones and Pewsey, 2009) with parameters
$(\sigma_1, \sigma_2, \rho, \epsilon_1, \epsilon_2, \delta_1, \delta_2) = (1, 1, 0.75, 0, 1, 1, 2)$.
Figure 1(a) shows a contour plot of the corresponding density. This is a complex scenario where
the entries present departure from normality and correlation. The sample correlation
coefficient is 0.737 and the theoretical correlation is 0.743. The parameter $\theta$ in
this family of distributions is not generally tractable. The theoretical value of $\theta$, obtained by
numerical integration, is 0.78. Figure 1(b) shows the bootstrap distribution of $\hat{\theta}$ using several
nonparametric estimators. We can observe a considerable influence of the assumptions of
pairing and dependence in the location and spread of the bootstrap distributions of $\hat{\theta}$. We can
also notice the influence of these assumptions in the point estimators and bootstrap confidence
intervals shown in Table 1. In this case, not including these assumptions leads to underestimating
$\theta$.




Figure 1: (a) Contour plot: sinh-arcsinh distribution; (b) Simulated data: bootstrap distributions of
$\hat{\theta}$ using different estimators; "Independent" (bold-dashed line), "Paired" (bold line), "Kernel"
(solid line), "ECDF" (dashed line), "MLE" (dotted line), "SMLE" (dotted-dashed line).



Estimator     $\hat{\theta}$   Normal           Basic            Percentile       BCa
Independent   0.65             (0.560, 0.724)   (0.559, 0.723)   (0.568, 0.732)   (0.562, 0.727)
Paired        0.65             (0.606, 0.695)   (0.606, 0.696)   (0.607, 0.697)   (0.604, 0.694)
Kernel        0.76             (0.690, 0.825)   (0.692, 0.827)   (0.690, 0.824)   (0.684, 0.819)
ECDF          0.81             (0.734, 0.886)   (0.740, 0.890)   (0.730, 0.880)   (0.720, 0.870)
MLE           0.78             (0.707, 0.853)   (0.709, 0.854)   (0.705, 0.850)   (0.701, 0.847)
SMLE          0.77             (0.704, 0.844)   (0.707, 0.847)   (0.694, 0.835)   (0.694, 0.835)

Table 1: Simulated data: Estimators and 95% bootstrap confidence intervals.

3.2 Real data

In this section we study the data set presented in Venkatraman and Begg (1996), which contains
72 lesion scores obtained using both a clinical scheme without a dermoscope (X Test), and a
dermoscopic scoring scheme (Y Test). Their main interest is to assess the information
provided by the use of the dermoscope. Here, we analyse the subset of 51 non-diseased patients
(diagnosed using a biopsy) and compare the nonparametric inferences for $\theta$ obtained under
three assumptions: independence, pairing and independence, and dependence of the tests,
using the estimators described in the introduction of this section. It is important to note that the
sample correlation coefficient is 0.794, which suggests that the entries are
correlated.

Table 2 shows point estimators and four types of bootstrap confidence intervals of $\theta$. Figure
2 shows the bootstrap distributions of $\hat{\theta}$ corresponding to the models described in Table 2.
We can note a discrepancy of the point estimators under the assumptions of dependence and
independence of the tests. Interval inference is also different; in the cases where pairing and
dependence are not considered we can note that the value $\theta = 0.5$ is included in some of
the bootstrap confidence intervals, leading to different conclusions about the relationship of
the tests. This is in line with the conclusions in Rubio and Steel (2012) and emphasises the
importance of the dependence and pairing assumptions.



Estimator     $\hat{\theta}$   Normal            Basic            Percentile       BCa
Independent   0.55             (0.469, 0.678)    (0.467, 0.672)   (0.450, 0.656)   (0.474, 0.691)
Paired        0.55             (0.498, 0.597)    (0.497, 0.596)   (0.501, 0.601)   (0.499, 0.598)
Kernel        0.63             (0.5245, 0.737)   (0.525, 0.738)   (0.528, 0.741)   (0.519, 0.732)
ECDF          0.69             (0.559, 0.813)    (0.569, 0.823)   (0.549, 0.804)   (0.529, 0.784)
MLE           0.65             (0.543, 0.776)    (0.544, 0.776)   (0.532, 0.765)   (0.537, 0.768)
SMLE          0.64             (0.538, 0.756)    (0.539, 0.757)   (0.527, 0.744)   (0.533, 0.749)

Table 2: Melanoma data: Estimators and 95% bootstrap confidence intervals.

Figure 2: Melanoma data: bootstrap distributions of $\hat{\theta}$ using different estimators; "Independent"
(bold-dashed line), "Paired" (bold line), "Kernel" (solid line), "ECDF" (dashed line), "MLE" (dotted
line), "SMLE" (dotted-dashed line).

4 Discussion 

We presented a class of nonparametric estimators for $\theta = P(X < Y)$ for the case of paired,
possibly dependent, observations. This class of estimators avoids making assumptions on the
distribution and the dependence structure of (X, Y), which are implicitly included in the
estimation by modelling the differences of the observations. Confidence intervals for $\theta$, based on
these estimators, can be obtained using bootstrap methods which are easy to implement in R.
It was illustrated, using a real data set, that not accounting for these assumptions might lead
to opposite conclusions about $\theta = 0.5$, and consequently about the relationship between the
variables X and Y.

A possible extension of this work consists of estimating $\theta$ in the context of censored and
missing observations. The ideas presented here can be extended to these scenarios by using
that

$$\theta = \int_{-\infty}^{\infty} \int_{-\infty}^{y} f_{X,Y}(x, y)\,dx\,dy,$$



and replacing the joint density $f_{X,Y}$ with a nonparametric density estimator. The use of kernel
density estimators in these contexts has been studied, for example, in Titterington and Mill
(1983) and Wells and Yeo (1996).
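The link between the double integral of the joint density over the region x < y and the difference-based estimators of Section 2 can be made explicit with the change of variables $z = y - x$. The following is a short sketch, assuming that (X, Y) has a joint density $f_{X,Y}$ so that the order of integration can be exchanged:

```latex
\begin{align*}
\theta &= \int_{-\infty}^{\infty}\int_{-\infty}^{y} f_{X,Y}(x,y)\,dx\,dy
        = \int_{-\infty}^{\infty}\int_{0}^{\infty} f_{X,Y}(x, x+z)\,dz\,dx \\
       &= \int_{0}^{\infty} f_Z(z)\,dz,
       \qquad f_Z(z) = \int_{-\infty}^{\infty} f_{X,Y}(x, x+z)\,dx .
\end{align*}
```

With complete paired data, $f_Z$ can be estimated directly from the differences; the joint-density form is the one that remains available when some pairs are only partially observed.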